<html>
<head><title>BOXER XML tags</title></head>
<body>

<h1>BOXER XML tags (2009-05-01 version)</h1>


<p>
Proposed reorganization of the BOXER XML format, in response to David's
2009-01-22 API design plan (referred to below as DL090122).
</p>

<table border=1>
<tr>
<td colspan=4 align=center><strong>Top-level elements.</strong><br>
(These elements may appear as the root element of an XML file, and may be passed to
publicly exposed methods of the BOXER API.)</td>
</tr>
<tr><th>Element's tag</th><th>Content</th><th>Usage</th>
<th>Attributes</th>
</tr>

<tr>
<td><tt>discrimination</tt></td>
<td>A Discrimination: list of classes, plus optional properties of this
discrimination</td>
<td>
Passed into the public method <tt>Suite.addDiscriminations(Element
DiscriminationsDefinition)</tt>. Elsewhere, used as part of
a <tt>suite</tt> element.
</td>
<td>defaultclass="classname" leftoversclass="classname" fallback="true"
classstructure="Bounded" maxnumberofclasses="10"
<br>
<em>Note: the fallback discrimination can only appear as the first discrimination within a <tt>suite</tt> element.</em>
</td>
</tr>

<tr>
<td><tt>suite</tt></td>
<td>Contains the description of a suite:
<ul>
<li>parameters controlling suite behavior on addition of new discriminations or classes
<li>list of <tt>discrimination</tt> elements (optional)
</ul>
</td>
<td>Used to create a new suite from scratch in the Suite(Element) constructor (corresponds to "Suite.SetCharacteristics(SuiteDefinition)" in DL090122).
</td>
</tr>

<tr>
<td><tt>learner</tt></td>
<td>Contains:<ul>
<li>Algorithm name (e.g. boxer.TruncatedGradient) and parameter values (e.g. "theta").
<li>Optionally, the complete internal state of the algorithm. If it is not provided, the algorithm state is initialized with zeros (or whatever the defaults are for the algorithm in question).
</ul>
Does <em>not</em> contain the description of the Suite and
FeatureDictionary; this means that a <tt>learner</tt> element will usually be enclosed in a <tt>learnercomplex</tt> element (which contains those).
</td>
<td><ul>
<li>Passed to Suite.AddLearner(Configuration).
<li>Can be written or read as part of a <tt>learnercomplex</tt> element when serializing or deserializing the complete state of a set of learners on the same suite.
</ul>
A <tt>learner</tt> element without internal state can be used to create a Learner from scratch, with its matrices initialized to default values (e.g. zeros).
</td>
</tr>


<tr>
<td><tt>learnercomplex</tt></td>
<td>Contains:<ul>
<li><tt>suite</tt> (Suite description)
<li><tt>features</tt> (Feature dictionary)
<li><tt>learners</tt> element, which encloses one or more <tt>learner</tt> elements
</ul>
</td>
<td>
Accepted by Suite.Deserialize(Element), which creates a suite with an array of Learners based on it.
</td>
</tr>
<tr>
<td><tt>model</tt></td>
<td>Contains:
<ul>
<li><tt>suite</tt> (Suite description)
<li><tt>features</tt> (Feature dictionary)
<li><tt>modeldata</tt> element, containing the <tt>type</tt>
attribute, which will specify how the model is to be interpreted. The
fallback type is "learner", which means that the <tt>modeldata</tt> is
implemented merely as a wrapper around the <tt>learner</tt> element
(which, in its turn, will contain the state of a learner for one or
all discriminations). Other model types (e.g., a sparse PLRM matrix)
can be added later on as well.
</ul>
</td>
<td>
Produced by Suite.EmitModel(DiscriminationName, LearnerName) and
Suite.EmitModelSuite(LearnerName). The difference between the two is that the former contains only the data pertaining to one discrimination.
<br>

To deserialize and use the model, the class Model and its method
Model.deserializeModel() will be used. This class will instantiate
the model as an instance of an appropriate subclass, which is
responsible for applying the model. The "default" subclass,
LearnerWrapperModel (instantiated when a default-type "model" is
read), will simply create a "captive" Learner inside itself by
deserializing whatever Learner state was wrapped into
the model. LearnerWrapperModel will then delegate to the Learner's
ApplyModel method.<br>

Naturally, other (more streamlined and efficient) model formats may be
added later on.
</td>
</tr>

<tr>
<td><tt>datapoint</tt></td>
<td>List of feature:value pairs describing a datapoint, and, optionally, a list of class labels describing its assignments to classes</td>
<td>Accepted by Learner.absorbExample(Element) (though if one foresees
reusing the data point, it makes more sense to convert it into a
DataPoint instance first). More importantly, it is used as the main
building block of the <tt>dataset</tt> element.
</td>
</tr>

<tr>
<td><tt>dataset</tt></td>
<td>List of <tt>datapoint</tt> elements
</td>
<td>Can be read by  ParseXML.parseDatasetElement(), to produce a Vector (a Java array-like structure) of DataPoints.
</td>
</tr>

</table>
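
<p>
The note on the <tt>discrimination</tt> row (a fallback discrimination may only be the first
discrimination within a <tt>suite</tt> element) can be checked mechanically. The following is a
minimal Java sketch under stated assumptions, not part of the BOXER API: the class
<tt>SuiteSketch</tt> and its methods are hypothetical names, <tt>discrimination</tt> elements are
assumed to be direct children of the <tt>suite</tt> element, and the fallback discrimination is
assumed to be marked with fallback="true". Only the standard Java DOM classes are used.
</p>
<pre>
```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Illustrative sketch only; SuiteSketch is a hypothetical helper, not a BOXER class.
class SuiteSketch {

    // Creates an empty DOM document using the standard JAXP factory.
    static Document newDocument() {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    // Builds a "discrimination" element carrying attributes listed in the table.
    // The list of classes is omitted because its encoding is not described here.
    static Element discrimination(Document doc, boolean fallback) {
        Element d = doc.createElement("discrimination");
        d.setAttribute("classstructure", "Bounded");
        d.setAttribute("maxnumberofclasses", "10");
        if (fallback) {
            d.setAttribute("fallback", "true");
        }
        return d;
    }

    // Returns false if a fallback discrimination appears anywhere other than
    // as the first "discrimination" child of the given "suite" element.
    static boolean fallbackIsFirst(Element suite) {
        NodeList children = suite.getChildNodes();
        int seen = 0; // number of discrimination children visited so far
        for (int i = 0; i != children.getLength(); i++) {
            Node n = children.item(i);
            if (n.getNodeType() != Node.ELEMENT_NODE) continue;
            Element e = (Element) n;
            if (!"discrimination".equals(e.getTagName())) continue;
            if ("true".equals(e.getAttribute("fallback"))) {
                if (seen != 0) return false;
            }
            seen++;
        }
        return true;
    }

    public static void main(String[] args) {
        Document doc = newDocument();
        Element suite = doc.createElement("suite");
        suite.appendChild(discrimination(doc, true));  // fallback comes first
        suite.appendChild(discrimination(doc, false));
        System.out.println(fallbackIsFirst(suite));
    }
}
```
</pre>
<p>
An element built this way is the kind of object the Suite(Element) constructor is described as
accepting; whether BOXER itself enforces the ordering rule at that point is not stated in this
document.
</p>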

<br>

<table border=1>
<tr>
<td colspan=3 align=center><strong>Lower-level elements.</strong><br>
(These tags are always enclosed by other elements, and are not directly used by
publicly exposed methods of the BOXER API.)</td>
</tr>
<tr><th>Element's tag</th><th>Content</th><th>Usage</th></tr>

<tr>
<td><tt>features</tt></td>
<td>A feature dictionary (list of features)</td>
<td>Serializing and deserializing the feature dictionary. This is nothing more
than the list of all features that BOXER has seen in input example
files, and to which e.g. the rows of PLRM matrices correspond.
</td>
</tr>

<tr>
<td><tt>classifier</tt></td>
<td>A section of a learner's data (coefficient matrices etc.) pertaining to a particular discrimination</td>
<td>Enclosed in a <tt>learner</tt> element</td>
</tr>

<tr>
<td><tt>matrix</tt></td>
<td>A sparse matrix whose rows correspond to features and whose columns correspond to classes
</td>
<td>Typically used as a component of <tt>learner</tt> elements representing PLRM-based learning algorithms. May also be used directly as a component of a <tt>model</tt> element.
</td>
</tr>
</table>
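
<p>
To illustrate how the lower-level tags nest inside a top-level element, the following minimal Java
sketch builds an empty <tt>learnercomplex</tt> skeleton with the standard DOM API. It is an
illustration under stated assumptions, not BOXER code: the class <tt>LearnerComplexSketch</tt> and
its methods are hypothetical names, all attributes and payload are omitted, and placing
<tt>matrix</tt> inside <tt>classifier</tt> is an assumption (the tables only say that both are
enclosed in a <tt>learner</tt> element).
</p>
<pre>
```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Illustrative sketch only; LearnerComplexSketch is a hypothetical helper, not a BOXER class.
class LearnerComplexSketch {

    // Appends a new, empty child element with the given tag and returns it.
    static Element child(Element parent, String tag) {
        Element e = parent.getOwnerDocument().createElement(tag);
        parent.appendChild(e);
        return e;
    }

    // Builds an empty skeleton: learnercomplex contains suite, features and
    // learners; learners encloses learnerCount learner elements.
    // learnerCount must be non-negative.
    static Element skeleton(int learnerCount) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
        Element root = doc.createElement("learnercomplex");
        doc.appendChild(root);
        child(root, "suite");
        child(root, "features");
        Element learners = child(root, "learners");
        for (int i = 0; i != learnerCount; i++) {
            Element learner = child(learners, "learner");
            Element classifier = child(learner, "classifier");
            child(classifier, "matrix"); // assumption: matrix nested inside classifier
        }
        return root;
    }

    public static void main(String[] args) {
        Element root = skeleton(2);
        // Counts elements whose tag is exactly "learner".
        System.out.println(root.getElementsByTagName("learner").getLength());
    }
}
```
</pre>
<p>
A real <tt>learnercomplex</tt> would also carry the algorithm name, parameter values and matrix
contents, whose exact encoding is not specified in these tables.
</p>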

<hr>
<p>
Back to <a href="boxer-user-guide.html">BOXER User Guide</a>
</p>


</body>
</html>
